On the Dangers of Cross-Validation. An Experimental Evaluation

Authors

  • R. Bharat Rao
  • Glenn Fung

Abstract

Cross-validation allows models to be tested using the full training set by means of repeated resampling, thereby maximizing the total number of points used for testing and potentially helping to protect against overfitting. Improvements in computational power, recent reductions in the computational cost of classification algorithms, and the development of closed-form solutions for performing cross-validation in certain classes of learning algorithms make it possible to test thousands or millions of variants of learning models on the data. Thus, it is now possible to calculate cross-validation performance on a much larger number of tuned models than would otherwise have been possible. However, we empirically show that when such a large number of models is evaluated, the risk of overfitting increases and the performance estimated by cross-validation is no longer an effective estimate of generalization; hence, this paper provides an empirical reminder of the dangers of cross-validation. We use a closed-form solution that makes this evaluation possible for the cross-validation problem of interest. In addition, through extensive experiments we expose and discuss the effects of the overuse and misuse of cross-validation in several respects, including model selection, feature selection, and data dimensionality. This is illustrated on synthetic, benchmark, and real-world data sets.
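The abstract's central point, that cross-validated performance becomes optimistic once it is used to pick among very many models, can be illustrated with a small sketch. This is not the authors' code or experimental setup. It assumes ridge regression (regularized least squares) on labels in {-1, +1} as the learner, uses the standard hat-matrix identity for closed-form leave-one-out (LOO) residuals, e_(i) = (y_i - yhat_i) / (1 - H_ii), and picks hypothetical sizes (40 training points, 200 pure-noise features, 2000 random 5-feature models). All function names are illustrative. Because the labels are independent of the features, any accuracy above chance for the selected model is an artifact of the selection itself.

```python
import numpy as np


def fit_ridge(X, y, lam=1.0):
    """Ridge (regularized least-squares) weights, no intercept term."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def loo_residuals_ridge(X, y, lam=1.0):
    """Closed-form leave-one-out residuals for ridge regression.

    With the hat matrix H = X (X^T X + lam*I)^{-1} X^T, the residual of point i
    under a model trained without point i equals (y_i - yhat_i) / (1 - H_ii),
    so all n held-out residuals come from a single fit (no refitting).
    """
    d = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)
    return (y - H @ y) / (1.0 - np.diag(H))


def loo_accuracy(X, y, lam=1.0):
    """LOO accuracy of the sign classifier for labels in {-1, +1}."""
    held_out_pred = y - loo_residuals_ridge(X, y, lam)
    return float(np.mean(np.sign(held_out_pred) == y))


def best_of_many_subsets(n=40, d=200, k=5, n_models=2000, n_test=2000, seed=0):
    """Pick the best of many random k-feature models by LOO accuracy on pure noise.

    Returns (best LOO accuracy, accuracy of that model on fresh data).
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))        # features carry no signal
    y = rng.choice([-1.0, 1.0], size=n)     # labels independent of X
    best_acc, best_cols = -1.0, None
    for _ in range(n_models):
        cols = rng.choice(d, size=k, replace=False)
        acc = loo_accuracy(X[:, cols], y)
        if acc > best_acc:
            best_acc, best_cols = acc, cols
    # Fresh data from the same distribution approximates true generalization.
    X_test = rng.standard_normal((n_test, d))
    y_test = rng.choice([-1.0, 1.0], size=n_test)
    w = fit_ridge(X[:, best_cols], y)
    test_acc = float(np.mean(np.sign(X_test[:, best_cols] @ w) == y_test))
    return best_acc, test_acc


if __name__ == "__main__":
    cv_acc, fresh_acc = best_of_many_subsets()
    print(f"best LOO accuracy: {cv_acc:.3f}, fresh-data accuracy: {fresh_acc:.3f}")
```

Running the script should show a best LOO accuracy well above 0.5 while accuracy on fresh data stays near 0.5; the exact numbers depend on the seed and the assumed sizes. The closed-form LOO step is what makes scoring thousands of candidate models cheap, which is precisely the situation in which the selected model's cross-validation estimate stops reflecting generalization.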

Similar resources

Determining optimal value of the shape parameter $c$ in RBF for unequal distances topographical points by Cross-Validation algorithm

Several radial basis function based methods contain a free shape parameter which has a crucial role in the accuracy of the methods. Performance evaluation of this parameter in different functions with various data has always been a topic of study. In the present paper, we consider studying the methods which determine an optimal value for the shape parameter in interpolations of radial basis ...

Validation of the Early Feeding Skills Assessment Scale for the Evaluation of Oral Feeding in Premature Infants

Background: Feeding difficulties are common and important in premature infants. In order to identify neonatal feeding difficulties, clinicians and nurses require assessment tools to conduct an objective evaluation of infant oral feeding (breast/bottle-feeding). Early identification of infants with feeding difficulty is critical to implement appropriate therapies and op...

Development and Validation of Attitude toward Gestational Surrogacy Scale in Iranian Infertile Couples

Objective: Surrogacy is one of the most challenging infertility treatments engaging ethical, psychological and social issues. Attitudes survey plays an important role to disclosure variant aspects of surrogacy, to help meeting legislative gaps and ambiguities, and to convert controversial dimensions surrounding surrogacy to a normative concept that eliminates stigma. The aim of this study is to ...

Synthesis and Experimental-Modelling Evaluation of Nanoparticles Movements by Novel Surfactant on Water Injection: An Approach on Mechanical Formation Damage Control and Pore Size Distribution

Water injection is used as a widespread IOR/EOR method and promising formation damages (especially mechanical ones) is a crucial challenge in the near-wellbore of injection wells. The magnesium oxide (MgO) NanoParticles (NPs) considered in the article underwater flooding experiment tests to monitor the promising mechanical formation damage (size exclusion) in lab mechanistic scale include m...

The Development and Validation of New Equations for Prediction of the Performance of Tangential Cyclones

New equations have been developed to predict the effect of geometrical dimensions of tangential cyclones on their operational performances. To check the validity of the derived equations, an experimental apparatus was set up and some experimental work was performed. It was observed that the experimental results confirm properly the theoretical predictions.

Publication date: 2008